Abstract

The typical particle filtering approximation error is exponentially dependent on the dimension of the
model. Therefore, to control this error, an enormous number of particles is required, which means a heavy
computational burden that is often so great it is simply prohibitive. Rebeschini and van Handel (2013)
consider particle filtering in a large-scale dynamic random field. Through a suitable localisation operation,
they prove that a modified particle filtering algorithm can achieve an approximation error that is mostly
independent of the problem dimension. To achieve this feat, they inadvertently introduce a systematic bias
that is spatially dependent (in that the bias at one site is dependent on the location of that site). This bias
consequently varies throughout the field. In this work, a simple extension to the algorithm of Rebeschini and
van Handel is introduced which acts to average this bias term over each site in the field through a kind of
spatial smoothing. It is shown that for a certain class of random field it is possible to achieve a completely
spatially uniform bound on the bias and that in any general random field the spatial inhomogeneity is
significantly reduced when compared to the case in which spatial smoothing is not considered. While the focus
is on spatial averaging in this work, the proposed algorithm seemingly exhibits other advantageous properties
such as improved robustness and accuracy in those cases in which the underlying dynamic field is time varying.


1  Introduction 

Particle filtering, as it applies here, is a powerful technique for (recursive) estimation and inference in
nonlinear dynamical state-space models subject to stochastic influence. In theory, the state of an underlying
stochastic dynamical system can be recursively estimated by composing the posterior probability of the state
conditioned on the random observations as they become available and using a given initial prior and the
stochastic dynamical system model. This process is known as recursive Bayesian filtering [1] and it is
generally intractable in practice [10]. The particle filter is an approximation of the Bayesian filter that
employs random sampling to represent the posterior, where such samples (or particles) are propagated in
practice through (sequential) importance sampling [9] that attempts to capture the dynamics of the underlying
system as well as the likelihood model on the observations. A resampling step inserted into the recursion is
also crucial to avoid sample degeneracy and control the variance over time. To date, the particle filter has
been well studied and we point to the still relevant early work [11], [12] as well as the comprehensive
coverage in (9JSI for further background.

The convergence of the particle filter in time has been well considered [8], [6]. By convergence in time, we
mean that one can show that the approximation error, with respect to an idealised Bayesian filter, and due
mostly to the random sampling, can be controlled through time [6]. One can even show that the filter
approximation error due to the sampling approximation remains bounded uniformly in time (S, IT [7 ]. Such
results provide significant grounding for the use of particle filtering in numerous application domains [9].

Despite substantial analysis justifying use of the particle filter in numerous applications of interest, a
limitation to date surrounds application of the particle filter in high-dimensional estimation and inference
problems [16], [18], [19]. The limiting factor is computational complexity. Analysis in [3], [23] suggests that
the particle filter approximation error is exponential in the dimension of the underlying (measurement) model
while the same error is controlled by the number of samples according to something like the inverse square
root. This relationship is clearly exposited in [15]. The conclusion is simply that an enormous number of
particles (exponential in the dimension) must be maintained if one is to control the estimation error at a
reasonable level when applying standard particle filter implementations in a high-dimensional dynamical
estimation problem. An enormous number of particles means a heavy computational burden which is often so large
it is prohibitive.

The good news is that recent studies [2], [3], [14], [15] imply that high-dimensional particle filtering may be
feasible in particular applications and/or if one is willing to accept a degree of systematic bias. In [2], the
particle filter is applied in a static setting where the objective is to sample from some high-dimensional
target distribution. In this case, through a sequence of intermediate and simpler distributions, it is shown
that the particle filter will converge to a sampled representation of the target distribution with a typical
Monte Carlo error (inverse in the number of particles) given a complexity on the order of the dimension
squared. Although [2] deals only, in essence, with a static problem of sampling from a fixed target
distribution, the analysis introduces a novel way of thinking about high-dimensional particle filtering which
may carry over to dynamic filtering problems. Related work appears in [3].

1.1  Background:  The  Motivating  Paper

Rebeschini and van Handel in [14] consider particle filtering in large-scale dynamic random fields. They
introduce a simple blocked particle filter which localises the filter to the blocks in a partition of the
random field. Here, localisation means that the particle filtering prior distribution at each time is
independently updated/corrected within each block through application of the observations locally conditioned
on that block. The posterior over each block is independent across blocks and the posterior over the entire
field is just the product of each blocked posterior. The real contribution of [14] is a descriptive and
technical analysis that shows the error introduced due to the localisation procedure can be readily controlled
if the dynamics of the random field at each site are only locally dependent on those sites within close
proximity. The standard sampling approximation error is shown to be exponential in only the size of the
individual blocks. The number of samples/particles controls the sampling approximation error at the typical
rate while the error due to the localisation process is a systematic bias that can only be controlled through
an increase in the block size. Since each block is updated independently, parallel implementation is readily
applicable and the computational burden may be alleviated, although this remains to be seen in practice. While
the results of [14] are at the proof-of-concept stage, the idea is incredibly powerful and it provides the
entire motivation for the study herein.

It was just noted that the error due to the localisation, or blocking, procedure discussed in [14] is
systematic and controlled only by the block size. Actually, this is an exaggeration and the stated result is
significantly more promising. To analyse the effect of the blocking procedure, the authors introduce a tool
termed the decay of correlations property which captures a spatial notion of stability, i.e. which measures
how quickly dependence between sites in the field decays as a function of the distance between those sites.
They show that if a suitable decay of correlations exists, then the error at any site in the field introduced
due to the blocking procedure alone is dependent primarily on that site's distance to the border of the block.
In other words, the systematic bias introduced due to the blocking operator is small at those sites in the
field far removed from the block borders, large on the borders and varies in between. The blocking bias is not
spatially uniform. The error due to the sampling procedure inherent to all particle filters is still
exponential in the block size and controlled by the number of particles. The sampling approximation error
implies one should not seek excessively large block sizes. On the other hand, small blocks imply the spatial
inhomogeneity of the total error caused by the blocking bias is exacerbated. Ideally, one would like an
algorithm that maintains a spatially homogeneous error within each block. One can then focus on considering,
in detail, the block-size vs. error tradeoff. The authors in [14] devote much discussion to the spatial
inhomogeneity of the blocking bias and its significance.

1.2  Contribution 

This work details a simple, yet relevant, algorithmic adjustment to the blocked particle filter introduced in
[14]. The idea is to spatially average the bias caused by the blocking procedure by considering an adaptive
sequence of partitions over the random field instead of a single partition. This adaptive blocking procedure
has the effect of spatially smoothing the error caused by any single application of the blocking/localisation
operation. In a certain class of random fields, this smoothing effect leads to a completely spatially
homogeneous error bound. That is, the bound on the total filtering error at any site in the field is completely
independent of the location of that site. In more general random fields, this spatial averaging effect can
lead to a significant reduction in the spatial inhomogeneity of the particle filter error bounds when compared
with [14]. This smoothing effect is of practical relevance in those cases in which the blocks should be kept
relatively small for computational reasons or in which the dynamics of the random field are less than trivially
localised.

The results presented here are largely at the same proof-of-concept level as those presented in the
motivating paper [14]. The stated results in both studies are of a quantitative nature that is far from
optimal. Nevertheless, the applicability of the particular algorithms is likely far less restricted than a
strict reading of the results would suggest; indeed this is seemingly also true for the celebrated
time-uniform particle filter convergence results [8]. It is with this applicability in mind that the
algorithmic extensions considered herein are proposed. The extensions considered are conceptually simple, easy
to implement, and do not generally add to the computational burden of the algorithm. Beyond the important
spatial smoothing effect of the proposed filter, allowing for (adapting) multiple partitions of the field may
also provide algorithmic robustness in those cases in which the random field is time-varying, as the
partitions can be adapted online, or it may allow one to adaptively focus computation on certain locations of
interest for periods of time. Other advantages of partition adaptation are envisioned.

1.3  Paper  Organization 

The  paper  is  organised  as  follows.  In  Section  2  we  introduce  the  model  and  problem  setup.  In  Section  3  we 
introduce  the  Bayesian  filtering  framework,  the  (standard)  particle  filtering  algorithm  and  we  introduce  the 
adaptively blocked particle filter for high-dimensional estimation problems. In Section 4 we state the main
result,  which  consists  of  a  time-average  and  spatially  smoothed  total  error  bound  on  the  adaptively  blocked 
particle  filter  approximation  to  the  true  Bayesian  nonlinear  filter.  In  Section  4  we  also  note  that  the  bound  may 
well  be  completely  spatially-uniform  and  we  outline  the  strategy  for  proving  this  result.  In  Section  5  we  explore 
in  more  detail  the  error  bound  and  the  spatial  smoothing  effect  of  adaptively  sequencing  through  partitions  in 
the  blocked  particle  filter.  In  Section  5  we  discuss,  in  more  detail,  the  class  of  random  fields  and  the  sequence 
of  partitions  that  may  lead  to  complete  spatial  uniformity  in  the  error  bound. 

To this point, the problem formulation, algorithm, convergence results, and the algorithmic/convergence
discussions are given. A casual reader may stop at this point, and take for granted the convergence results and
the (increased) spatial homogeneity of the filtering approximation error across the random field. Indeed, the
conceptual simplicity of the algorithm, and its spatial averaging property, may be sufficient to convince one of
the general existence of such a convergence result (given also the results and analysis in [14]). Consequently,
the details of this result, as they are far from optimal, may be of lesser significance.

Going forward, in Section 6 we provide the technical analysis leading to the main result, following the proof
strategy introduced earlier. The proofs required in this work largely overlap with those in [14] and only the
required changes are derived here, with reference made to the motivating paper as often as possible. Indeed,
we encourage all readers interested in high-dimensional particle filtering to study [14] since a very
descriptive and accessible coverage of this topic and the blocked particle filter is provided therein (prior
to the detailed technical analysis set out in [14] and to which we point as often as possible to prove our
case).


2  Problem  Setup 

For simplicity and to ease comparison we borrow the problem scenario directly from [14].

Consider a Polish state space $X$ with $\sigma$-algebra $\mathcal{X}$ and reference measure $\psi$, and a Polish
state space $Y$ with $\sigma$-algebra $\mathcal{Y}$ and a reference measure $\varphi$. Introduce on $X$ a Markov
chain $(X_n)_{n \ge 0}$ with a transition density $p : X \times X \to \mathbb{R}_+$ with respect to $\psi$.
Introduce on $Y$ a sequence $(Y_n)_{n \ge 0}$ that is conditionally independent given $(X_n)_{n \ge 0}$ and has
a transition density $g : X \times Y \to \mathbb{R}_+$ with respect to $\varphi$. We interpret
$(X_n)_{n \ge 0}$ as an underlying dynamical process that is observed through $(Y_n)_{n \ge 0}$. The pair
$(X_n, Y_n)_{n \ge 0}$ is also a Markov chain.

Now suppose the state $(X_n, Y_n)$ at each time $n$ is a random field $(X_n^v, Y_n^v)_{v \in V}$ indexed by a
(finite) undirected graph $G = (V, E)$ where $V$ corresponds to the set of sites and $E$ corresponds to the set
of edges that define the structure of the field. The dimension of the model is then at least as big as the
cardinality of the vertex set $V$ and we assume this to be large.

To be more precise, the spaces $X$ and $Y$ are of product form $X = \prod_{v \in V} X^v$ and
$Y = \prod_{v \in V} Y^v$ respectively. The associated reference measures are then given by
$\psi = \bigotimes_{v \in V} \psi^v$ and $\varphi = \bigotimes_{v \in V} \varphi^v$ where $\psi^v$ and
$\varphi^v$ are reference measures on $X^v$ and $Y^v$ respectively. The transition densities $p$ and $g$ are
given by

$$p(x, z) = \prod_{v \in V} p^v(x, z^v), \qquad g(x, y) = \prod_{v \in V} g^v(x^v, y^v)$$

where $p^v : X \times X^v \to \mathbb{R}_+$ and $g^v : X^v \times Y^v \to \mathbb{R}_+$ are defined with respect
to $\psi^v$ and $\varphi^v$ respectively.

The model assumes the observations $(Y_n)_{n \ge 0}$ are completely local in the sense that $g^v(x^v, y^v)$
depends only on $x^v$ or, in other words, the conditional distribution of $Y_n^v$ given $X_n$ depends only on
$X_n^v$.

A distance $d(v, v')$, that counts hops along the shortest path between $v, v' \in V$, is associated with
$G = (V, E)$. Now for a fixed $r \in \mathbb{N}$ and for each vertex $v \in V$ we define
$N(v) = \{v' \in V : d(v, v') \le r\}$ which specifies a neighbourhood of $v$. We then assume the dynamics of
$(X_n)_{n \ge 0}$ are local in the sense that $p^v(x, z^v)$ depends only on $x^{N(v)}$ or, in other words, the
conditional distribution of $X_n^v$ given $X_0, \ldots, X_{n-1}$ depends only on $X_{n-1}^{N(v)}$. More
precisely, the dynamics obey $p^v(x, z^v) = p^v(\tilde{x}, z^v)$ whenever $x^{N(v)} = \tilde{x}^{N(v)}$, where
$x^J = (x^j)_{j \in J}$ for $J \subseteq V$.
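
As a concrete illustration of the hop distance $d(v, v')$ and the neighbourhood $N(v)$, the following minimal Python sketch computes both by breadth-first search. The adjacency-dictionary representation and the helper `line_graph` are assumptions made for this example only; they do not appear in [14] or elsewhere in this paper.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: hop distance d(source, v') to every reachable site."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def neighbourhood(adjacency, v, r):
    """N(v) = {v' in V : d(v, v') <= r}."""
    return {u for u, d in hop_distances(adjacency, v).items() if d <= r}


def line_graph(n_sites):
    """Hypothetical example field: sites 0..n_sites-1 on a line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n_sites]
            for i in range(n_sites)}


adjacency = line_graph(6)
print(sorted(neighbourhood(adjacency, 2, 1)))  # sites within one hop of site 2
```

Under the locality assumption, the transition density at site $v$ would only ever read the components of the state indexed by `neighbourhood(adjacency, v, r)`.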


We refer to the motivating paper [14] for further discussion on such models and the references therein for
background on where such models appear in the literature. Also, see [20] for such modelling motivation.

Since the process $(X_n)_{n \ge 0}$ is not directly observable, the filtering problem of interest is one of
recursively estimating the unobserved state $X_n$ given the observation history $Y_1, \ldots, Y_n$. That is,
the filtering problem is one of computing

$$\pi_n^\mu = \mathbf{P}^\mu[X_n \in \cdot \mid Y_1, \ldots, Y_n]$$

where $\mathbf{P}^\mu$ is the probability measure under which $(X_n, Y_n)_{n \ge 0}$ is a Markov chain with a
transition probability $P$ that can be factored as
$P((x, y), A) = \int \mathbf{1}_A(x', y')\, p(x, x')\, g(x', y')\, \psi(dx')\, \varphi(dy')$ for any
$A \in \mathcal{X} \times \mathcal{Y}$ and where the initial condition $X_0 \sim \mu$ is an arbitrary
probability measure $\mu$ on $X$. Notationally, $\pi_n^x = \pi_n^{\delta_x}$.


3  Adaptively  Blocked  Particle  Filtering 


Firstly, we outline the ideal nonlinear recursive Bayes filter, followed by the standard bootstrap particle
filter. We note briefly the computational problem involved in applying the bootstrap filter to
high-dimensional estimation problems. We then outline our adaptively blocked particle filter, note its
straightforward relationship to the algorithm of [14], and discuss generally the motivation for this algorithm
as it applies to filtering of high-dimensional systems.

It  is  well  known  that  through  an  application  of  Bayes  rule,  the  filter  can  be  computed  recursively  via 

$$\pi_0^\mu = \mu, \qquad \pi_n^\mu = F_n \pi_{n-1}^\mu \quad (n \ge 1)$$

where it is common, for practical as well as conceptual reasons, to define $F_n = C_n P$ where

$$(P\rho)(f) = \int f(x')\, p(x, x')\, \psi(dx')\, \rho(dx)$$

is a prediction in which the filter estimate is propagated forward using the dynamics of the underlying
process $(X_n)_{n \ge 0}$ and

$$(C_n \rho)(f) = \frac{\int f(x)\, g(x, Y_n)\, \rho(dx)}{\int g(x, Y_n)\, \rho(dx)}$$

is a so-called correction (or update) in which the predicted distribution is updated by conditioning it on the
observation $Y_n$ to obtain $\pi_n^\mu$. Graphically,

$$\pi_{n-1}^\mu \xrightarrow{\ \text{prediction}\ } P\pi_{n-1}^\mu \xrightarrow{\ \text{correction}\ } \pi_n^\mu = C_n P \pi_{n-1}^\mu$$

The recursive structure of the filter allows estimation of the underlying dynamic process to be carried out
'online' over a long time horizon, incorporating measurements as they become available. However, it is well
known that to this point such a filter is impractical since at the level of arbitrary probability measures it
must be considered of infinite dimension. In general, no exact finite dimensional nonlinear filter can be
computed.
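
To make the prediction and correction operators concrete, the following sketch evaluates the recursion exactly in the one setting where that is straightforward: a finite state space, where measures are probability vectors and the reference measures are counting measures. The two-state model and all numerical values are hypothetical and serve only to illustrate $F_n = C_n P$.

```python
def predict(rho, p):
    """Prediction: (P rho)(x') = sum_x rho(x) p(x, x')."""
    n_states = len(rho)
    return [sum(rho[x] * p[x][xp] for x in range(n_states))
            for xp in range(n_states)]


def correct(rho, likelihood):
    """Correction: (C_n rho)(x) = g(x, Y_n) rho(x) / sum_x g(x, Y_n) rho(x)."""
    unnormalised = [likelihood[x] * rho[x] for x in range(len(rho))]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def bayes_filter(mu, p, g, observations):
    """Apply F_n = C_n P once per observation, starting from pi_0 = mu."""
    pi = list(mu)
    for y in observations:
        likelihood = [g[x][y] for x in range(len(pi))]
        pi = correct(predict(pi, p), likelihood)
    return pi


# Hypothetical two-state model: p[x][x'] is the transition density and
# g[x][y] the observation density, both with respect to counting measure.
mu = [0.5, 0.5]
p = [[0.9, 0.1], [0.2, 0.8]]
g = [[0.8, 0.2], [0.3, 0.7]]
pi = bayes_filter(mu, p, g, [0, 0, 1])
print(pi)
```

On a general state space the integrals in `predict` and `correct` have no closed form, which is the impracticality referred to above.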

One approximation to the nonlinear filter employs sampling and Monte Carlo approximation. Let $N \ge 1$
denote the number of samples (or particles) used in the approximation and define $S^N$ to be the sampling
operator which computes a random measure

$$S^N \rho = \frac{1}{N} \sum_{i=1}^{N} \delta_{x(i)}, \qquad x(i) \text{ is i.i.d.} \sim \rho$$

with respect to some probability measure $\rho$. This random measure is a discrete approximation of $\rho$ and
converges to $\rho$ with $N$ at a typical rate of $1/\sqrt{N}$.
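
The $1/\sqrt{N}$ rate can be checked numerically. The sketch below is an illustration under assumed choices (the measure $\rho$ is taken to be uniform on $[0, 1]$ and the test function is $f(x) = x$, neither of which comes from the paper); it estimates the root-mean-square error of $(S^N\rho)(f)$ for several values of $N$.

```python
import math
import random


def sample_measure(sampler, n_particles, rng):
    """S^N rho: N i.i.d. draws from rho, each implicitly carrying weight 1/N."""
    return [sampler(rng) for _ in range(n_particles)]


def integrate(particles, f):
    """(S^N rho)(f) = (1/N) sum_i f(x(i))."""
    return sum(f(x) for x in particles) / len(particles)


def rms_error(n_particles, n_trials, rng):
    """Root-mean-square error of (S^N rho)(f) for rho = Uniform(0, 1), f(x) = x."""
    squared = 0.0
    for _ in range(n_trials):
        particles = sample_measure(lambda r: r.random(), n_particles, rng)
        squared += (integrate(particles, lambda x: x) - 0.5) ** 2
    return math.sqrt(squared / n_trials)


rng = random.Random(0)
for n in (100, 400, 1600):
    # Quadrupling N should roughly halve the error.
    print(n, rms_error(n, 200, rng))
```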

The  most  common  and  arguably  the  simplest  Monte  Carlo  approximation  of  nonlinear  filtering  is  given  by 

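$$\hat{\pi}_0^\mu = \mu, \qquad \hat{\pi}_n^\mu = \hat{F}_n \hat{\pi}_{n-1}^\mu \quad (n \ge 1)$$

where $\hat{F}_n = C_n S^N P$ now consists of three operations

$$\hat{\pi}_{n-1}^\mu \xrightarrow{\ \text{prediction}\ } P\hat{\pi}_{n-1}^\mu \xrightarrow{\ \text{sampling}\ } S^N P \hat{\pi}_{n-1}^\mu \xrightarrow{\ \text{correction}\ } \hat{\pi}_n^\mu = C_n S^N P \hat{\pi}_{n-1}^\mu$$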

This recursion yields the bootstrap particle filtering algorithm [11]. This algorithm is simple to implement
and $\hat{\pi}_n^\mu$ converges to the exact filter $\pi_n^\mu$ as $N \to \infty$. Resampling and other
operations are typically incorporated into the bootstrap particle filter to improve performance [9].
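
The three operations can be sketched in a few lines of Python. The scalar linear-Gaussian model below, its parameters `a`, `sigma_x`, `sigma_y`, and the standard normal initial measure are all assumptions chosen for illustration; the sketch is not taken from [11].

```python
import math
import random


def bootstrap_filter(observations, n_particles, rng,
                     a=0.9, sigma_x=1.0, sigma_y=1.0):
    """Bootstrap particle filter for the hypothetical scalar model
    X_n = a X_{n-1} + sigma_x * noise,  Y_n = X_n + sigma_y * noise.
    Returns the estimated filter mean at each time step."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # draws from mu
    weights = [1.0 / n_particles] * n_particles
    means = []
    for y in observations:
        # Prediction and sampling (S^N P): choose ancestors according to the
        # current weights, then propagate each through the dynamics.
        ancestors = rng.choices(particles, weights=weights, k=n_particles)
        particles = [a * x + rng.gauss(0.0, sigma_x) for x in ancestors]
        # Correction (C_n): reweight by the likelihood g(x, y) and normalise.
        unnormalised = [math.exp(-0.5 * ((y - x) / sigma_y) ** 2)
                        for x in particles]
        total = sum(unnormalised)
        weights = [w / total for w in unnormalised]
        means.append(sum(w * x for w, x in zip(weights, particles)))
    return means


rng = random.Random(0)
print(bootstrap_filter([0.3, -0.1, 0.8, 1.2, 0.9], 500, rng))
```

The sketch assumes the observations are not so extreme that every likelihood underflows to zero; a production implementation would work with log-weights.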


If the (standard) bootstrap particle filter is applied to a system of dimension $|V|$, then, typically [T6 . j4j
CCD, the approximation error is exponential in $|V|$ and inversely proportional to something like $\sqrt{N}$.
If $|V|$ is large, then one needs a huge number of particles $N$ to achieve a desired error rate and this
requires a heavy computational burden. In many applications, like target tracking [9], there are typically no
computational barriers to achieving an acceptable error. In large-scale, high-dimensional estimation problems
the particle filter is often computationally infeasible, which motivates the study in [14] and in this work.

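To this end, we introduce a partition $\mathcal{K}$ of the vertex set $V$ into non-overlapping blocks

$$K \cap K' = \emptyset, \quad K \ne K', \quad K, K' \in \mathcal{K} \quad \text{and} \quad V = \bigcup_{K \in \mathcal{K}} K$$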


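Now suppose there exists a finite number $m \in \mathbb{N}$ of partitions
$\mathcal{K}_0, \ldots, \mathcal{K}_{m-1}$ of this type. There exists a non-negative constant
$\theta \in \mathbb{R}_+$ and a positive $\vartheta \in \mathbb{R}_+$ such that, given a positive
$\beta \in \mathbb{R}_+$, for every node $v \in V$ we have

$$\theta \le \theta_m(v) = \frac{1}{m} \sum_{j=0}^{m-1} d(v, \partial K_j(v)) \quad \text{and} \quad 0 < \vartheta_m(v) = \frac{1}{m} \sum_{j=0}^{m-1} e^{-\beta\, d(v, \partial K_j(v))} \le \vartheta$$

where $K_j(v) \in \mathcal{K}_j$ and we write $K(v)$ when $v \in K$ for some $K$ in some $\mathcal{K}$. Here,
$\theta$ is the smallest average distance between any site and the borders
$\partial K_j(v) = \{v' \in K_j(v) : N(v') \not\subseteq K_j(v)\}$ of those blocks containing it, while
$\vartheta$ captures a similar property in a more roundabout manner. We now define, as in [14], the blocking
operator for some partition $\mathcal{K}$

$$B(\mathcal{K})\rho = \bigotimes_{K \in \mathcal{K}} B^K \rho$$

where for any measure $\rho$ on $X = \bigotimes_{v \in V} X^v$ and $J \subseteq V$ we denote by $B^J \rho$ the
marginal of $\rho$ on $\bigotimes_{v \in J} X^v$. The random field described by the measure
$B(\mathcal{K})\rho$ on $X$ is independent across blocks defined by the partition $\mathcal{K}$.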
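
For a finite state space the blocking operator is simply a product of block marginals. The sketch below is a hedged illustration on a hypothetical three-site binary field; the dictionary representation of a joint probability mass function and the function names `marginal` and `block_measure` are assumptions of this example.

```python
from itertools import product


def marginal(rho, block):
    """B^J rho: marginal of the joint pmf rho on the sites listed in block."""
    out = {}
    for state, prob in rho.items():
        key = tuple(state[v] for v in block)
        out[key] = out.get(key, 0.0) + prob
    return out


def block_measure(rho, partition):
    """B(K) rho: product of the block marginals, as a joint pmf over all sites."""
    marginals = [marginal(rho, block) for block in partition]
    n_sites = sum(len(block) for block in partition)
    blocked = {}
    for combo in product(*(m.items() for m in marginals)):
        state = [None] * n_sites
        prob = 1.0
        for block, (key, p_block) in zip(partition, combo):
            for v, value in zip(block, key):
                state[v] = value
            prob *= p_block
        blocked[tuple(state)] = prob
    return blocked


# Hypothetical field of three binary sites in which sites 1 and 2 always agree.
rho = {(0, 0, 0): 0.4, (1, 0, 0): 0.1, (0, 1, 1): 0.1, (1, 1, 1): 0.4}
blocked = block_measure(rho, [(0, 1), (2,)])
print(blocked)
```

In this example the block marginals are preserved, but the dependence between sites 1 and 2 is cut because they sit on opposite sides of a block border. This is the source of the blocking bias.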
The adaptively blocked particle filter adds a blocking operation into the bootstrap particle filter recursion

$$\hat{\pi}_0^\mu = \mu, \qquad \hat{\pi}_n^\mu = \hat{F}_n \hat{\pi}_{n-1}^\mu$$

where $\hat{F}_n = C_n B(\mathcal{K}_{\sigma(n)}) S^N P$ consists of four operations

$$\hat{\pi}_{n-1}^\mu \xrightarrow{\ \text{prediction / sampling}\ } S^N P \hat{\pi}_{n-1}^\mu \xrightarrow{\ \text{blocking / correction}\ } \hat{\pi}_n^\mu$$

where $\sigma : \mathbb{N} \to \{0, \ldots, m-1\}$ is a partition switching signal. If $m = 1$ then the
adaptively blocked particle filter reduces to the blocked particle filter considered in [14]. If
$\mathcal{K}_0 = \ldots = \mathcal{K}_{m-1} = \{V\}$ then the adaptively blocked particle filter reduces to the
bootstrap particle filter. The resulting algorithm is given in Algorithm 1.


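Algorithm 1  Adaptively Blocked Particle Filter
consider the partitions $\mathcal{K}_0, \ldots, \mathcal{K}_{m-1}$
let $\hat{\pi}_0^\mu = \mu$
for $k = 1, \ldots, n$ do
    resample i.i.d. $x_{k-1}(i) \sim \hat{\pi}_{k-1}^\mu$, $i = 1, \ldots, N$
    sample $x_k^v(i) \sim p^v(x_{k-1}(i), x^v)\, \psi^v(dx^v)$, $i = 1, \ldots, N$, $\forall v \in V$
    compute $w_k^K(i) = \dfrac{\prod_{v \in K} g^v(x_k^v(i), Y_k^v)}{\sum_{\ell=1}^{N} \prod_{v \in K} g^v(x_k^v(\ell), Y_k^v)}$, $i = 1, \ldots, N$, $\forall K \in \mathcal{K}_{\sigma(k)}$
    let $\hat{\pi}_k^\mu = \bigotimes_{K \in \mathcal{K}_{\sigma(k)}} \sum_{i=1}^{N} w_k^K(i)\, \delta_{x_k^K(i)}$
end for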
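
The steps of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the field is taken to be a line of scalar sites with $r = 1$, the local dynamics average each site's neighbourhood and add Gaussian noise, each site is observed in Gaussian noise, the initial measure is standard normal at each site, and the switching signal is $\sigma(k) = k \bmod m$. The names `adaptively_blocked_filter` and `shifted_partition` and the parameters `a`, `sigma_x`, `sigma_y` are hypothetical.

```python
import math
import random


def shifted_partition(n_sites, block_size, shift):
    """Contiguous blocks on a line, with block borders offset by `shift`."""
    cuts = sorted({0, n_sites} | set(range(shift, n_sites, block_size)))
    return [list(range(lo, hi)) for lo, hi in zip(cuts, cuts[1:])]


def adaptively_blocked_filter(observations, partitions, n_particles, rng,
                              a=0.5, sigma_x=1.0, sigma_y=1.0):
    """Returns the estimated posterior mean at every site for every time step."""
    n_sites, m = len(observations[0]), len(partitions)
    # i.i.d. draws from the initial measure mu (assumed standard normal per site).
    resampled = [[rng.gauss(0.0, 1.0) for _ in range(n_sites)]
                 for _ in range(n_particles)]
    particles, blocks, weights = None, None, None
    estimates = []
    for k, y in enumerate(observations, start=1):
        if blocks is not None:
            # Resample: the previous filter is a product over blocks, so each
            # block of a new particle is drawn independently of the others.
            resampled = [[0.0] * n_sites for _ in range(n_particles)]
            for block, w in zip(blocks, weights):
                picks = rng.choices(range(n_particles), weights=w, k=n_particles)
                for i, j in enumerate(picks):
                    for v in block:
                        resampled[i][v] = particles[j][v]
        # Sample: propagate each site using only its radius-1 neighbourhood.
        particles = []
        for x in resampled:
            new_state = []
            for v in range(n_sites):
                nbrs = x[max(0, v - 1):v + 2]
                new_state.append(a * sum(nbrs) / len(nbrs) + rng.gauss(0.0, sigma_x))
            particles.append(new_state)
        # Block and correct: one normalised weight vector per block of the
        # partition selected by the switching signal sigma(k) = k mod m.
        blocks = partitions[k % m]
        weights = []
        for block in blocks:
            log_w = [sum(-0.5 * ((y[v] - x[v]) / sigma_y) ** 2 for v in block)
                     for x in particles]
            top = max(log_w)
            unnormalised = [math.exp(lw - top) for lw in log_w]
            total = sum(unnormalised)
            weights.append([u / total for u in unnormalised])
        # Posterior mean at each site under the blocked product measure.
        mean = [0.0] * n_sites
        for block, w in zip(blocks, weights):
            for v in block:
                mean[v] = sum(wi * x[v] for wi, x in zip(w, particles))
        estimates.append(mean)
    return estimates


rng = random.Random(0)
n_sites = 8
partitions = [shifted_partition(n_sites, 4, s) for s in (0, 2)]
observations = [[rng.gauss(0.0, 1.0) for _ in range(n_sites)] for _ in range(5)]
estimates = adaptively_blocked_filter(observations, partitions, 200, rng)
print(estimates[-1])
```

Passing a single partition recovers a blocked filter with fixed borders, and passing the trivial partition `[list(range(n_sites))]` recovers a bootstrap filter, mirroring the two special cases noted above.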


The only difference between the adaptively blocked particle filter and the block particle filter of [14] is the
adaptive consideration of multiple field partitions during the execution of the algorithm. Going forward, the
notation $\hat{\pi}_n^\mu$ will refer to the adaptively blocked particle filter of Algorithm 1.

At any time $n$, only measurements in block $K \in \mathcal{K}_{\sigma(n)}$ are used to update the filter in
block $K$. Each block $K \in \mathcal{K}_{\sigma(n)}$ can therefore be updated in parallel. The complexity of
updating each block with a given error is thus dependent only on the cardinality of that block and not on the
dimension of the entire random field. If the additional error for the entire filter caused by the blocking
approximation can be sufficiently controlled, it seems the curse of dimensionality as it applies to the
particle filter in general can be alleviated.


It is clear that at any given iteration the blocking operator decouples the distribution at the boundaries of
the blocks. In the motivating paper [14], only a single partition is considered and it follows that the
filtering approximation error will be larger at those vertices close to the boundary of each block than at
those sites toward the centre of each block. The result is a spatially non-homogeneous filtering error and
indeed the error bound derived in the motivating paper [14] is spatially dependent. A comprehensive and
insightful discussion on this problem is provided in [14]. By adaptively applying different partitions one may
ensure that, on average, each site of the random field is (at least approximately) the same distance from the
borders of the blocks during some cycle. One would hope then that the error is, on average, spatially
homogeneous and that one can achieve an error bound that is site independent. We note with this in mind the
important special case of Algorithm 1 in which $\sigma(n) = n \pmod{m}$, that is, where the partitions are
applied in a cyclical order.
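
The averaging effect of a cyclical schedule can be checked directly on a small example. The sketch below assumes a ring of 12 sites, $r = 1$, and blocks of 4 contiguous sites rotated by one site per step; these choices, and the function names, are illustrative assumptions. It computes the average distance to the block border, $\frac{1}{m}\sum_{j} d(v, \partial K_j(v))$, for every site.

```python
def ring_distance(u, v, n_sites):
    """Hop distance between sites u and v on a ring of n_sites sites."""
    gap = abs(u - v) % n_sites
    return min(gap, n_sites - gap)


def shifted_ring_partition(n_sites, block_size, shift):
    """Equal contiguous blocks on a ring, rotated by `shift` sites.
    Assumes block_size divides n_sites and block_size < n_sites."""
    return [[(shift + start + i) % n_sites for i in range(block_size)]
            for start in range(0, n_sites, block_size)]


def boundary(block, n_sites, r=1):
    """Sites of the block whose radius-r neighbourhood leaves the block."""
    members = set(block)
    return [v for v in block
            if any((v + s) % n_sites not in members for s in range(-r, r + 1))]


def average_boundary_distance(v, partitions, n_sites, r=1):
    """Mean over the partitions of d(v, border of the block containing v)."""
    total = 0
    for partition in partitions:
        block = next(b for b in partition if v in b)
        total += min(ring_distance(v, u, n_sites)
                     for u in boundary(block, n_sites, r))
    return total / len(partitions)


n_sites = 12
single = [shifted_ring_partition(n_sites, 4, 0)]
cyclic = [shifted_ring_partition(n_sites, 4, s) for s in range(4)]
print([average_boundary_distance(v, single, n_sites) for v in range(n_sites)])
print([average_boundary_distance(v, cyclic, n_sites) for v in range(n_sites)])
```

With the single partition the distance alternates between 0 at border sites and 1 at interior sites, whereas cycling through the four rotations gives every site the same average of 0.5.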

We stress that beyond the possibility of spatial smoothing, allowing for multiple partitions of the field may
also provide algorithmic robustness in those cases in which the random field is time-varying, since the
partitions can be adapted online, or it may allow one to adaptively focus computation on certain locations of
interest for periods of time. The design of $\sigma$ seemingly offers much flexibility and other advantages of
partition adaptation are envisioned, but the discussion here will focus on spatial averaging of the blocking
bias.


4  The  Main  Results 


As in [14] we define the following norm

$$|||\rho - \rho'|||_J = \sup_{f \in \mathcal{X}^J : |f| \le 1} \mathbf{E}\big[|\rho(f) - \rho'(f)|^2\big]^{1/2}$$

between two random measures $\rho$ and $\rho'$ on $X$, where $\mathcal{X}^J$ is the class of all measurable
functions $f : X \to \mathbb{R}$ with $f(x) = f(\tilde{x})$ whenever $x^J = \tilde{x}^J$, for
$J \subseteq V$. For example, one then finds $|||\rho - S^N\rho|||_J \le 1/\sqrt{N}$ which captures the typical
Monte Carlo approximation error. The goal is to bound the error between the nominal (ideal) Bayesian filter
$\pi_n^\mu$ and the adaptively blocked particle filter $\hat{\pi}_n^\mu$. Recall that both the ideal filter and
the adaptively blocked particle filter are defined recursively

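$$\pi_n^\mu = F_n \cdots F_1 \mu, \qquad \hat{\pi}_n^\mu = \hat{F}_n \cdots \hat{F}_1 \mu$$

where $F_n = C_n P$ and $\hat{F}_n = C_n B(\mathcal{K}_{\sigma(n)}) S^N P$. From the triangle inequality we get

$$|||\pi_n^\mu - \hat{\pi}_n^\mu|||_J \le |||\pi_n^\mu - \tilde{\pi}_n^\mu|||_J + |||\tilde{\pi}_n^\mu - \hat{\pi}_n^\mu|||_J$$

for some $J \subseteq V$ where

$$\tilde{\pi}_n^\mu = \tilde{F}_n \cdots \tilde{F}_1 \mu$$

with $\tilde{F}_n = C_n B(\mathcal{K}_{\sigma(n)}) P$ is an ideal adaptively blocked filter; i.e. considering the
adaptive blocking operation as it applies to the (non-sampled version of the) ideal Bayesian filter.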

The expectation appearing in the definition of $|||\cdot|||_J$ is taken only with respect to the random
sampling $S^N$; see [14]. Hence,

$$|||\pi_n^\mu - \tilde{\pi}_n^\mu|||_J = \sup_{f \in \mathcal{X}^J : |f| \le 1} \mathbf{E}\big[|\pi_n^\mu(f) - \tilde{\pi}_n^\mu(f)|^2\big]^{1/2} = \sup_{f \in \mathcal{X}^J : |f| \le 1} |\pi_n^\mu(f) - \tilde{\pi}_n^\mu(f)|$$

since no sampling occurs in $\pi_n^\mu$ or in $\tilde{\pi}_n^\mu$, and in this case $|||\cdot|||_J$ defines a
local version of the total variation Qjjj which, as in [14], we sometimes denote by $\|\cdot\|_J$ for
$J \subseteq V$.

The first term in this decomposition quantifies the bias introduced by the blocking operation alone. The
second term quantifies the error due to the variance of the random sampling. Typical analysis on the
convergence of the bootstrap particle filter deals only with a variance term (since
$\tilde{\pi}_n^\mu = \pi_n^\mu$ in that case).

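Let $\Delta = \max_{v \in V} \mathrm{card}\{v' \in V : d(v, v') \le r\}$ and
$\Delta_{\mathcal{K}} = \max_s \max_{K \in \mathcal{K}_s} \mathrm{card}\{K' \in \mathcal{K}_s : d(K, K') \le r\}$.
Also define $\Delta_d(v) = \max_s d(v, \partial K_s)$, $\Delta_d = \max_v \Delta_d(v)$ and
$\nabla_d(v) = \min_s d(v, \partial K_s)$. Finally, let
$|\mathcal{K}|_\infty = \max_s \max_{K \in \mathcal{K}_s} |K|$.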

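Theorem 1 (Bounding the Variance). There exists a constant $0 < \varepsilon_0 < 1$, depending only on $\Delta$
and $\Delta_{\mathcal{K}}$, such that the following holds. Suppose there exist $\varepsilon_0 < \varepsilon < 1$
and $0 < \kappa < 1$ such that

$$\varepsilon \le p^v(x, z^v) \le \varepsilon^{-1}, \qquad \kappa \le g^v(x^v, y^v) \le \kappa^{-1} \qquad \forall v \in V,\ x, z \in X,\ y \in Y$$


Then for every $n \ge 0$, $x \in X$, and $v \in V$ we have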


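$$|||\tilde{\pi}_n^x - \hat{\pi}_n^x|||_v \le \alpha\, \frac{e^{\zeta |\mathcal{K}|_\infty}}{\sqrt{N}}$$

where $0 < \alpha, \zeta < \infty$ depend only on $\varepsilon$, $\kappa$, $r$, $\Delta$ and
$\Delta_{\mathcal{K}}$.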


The variance depends on the dimension of the sampling and so is necessarily dependent on the size of the
blocks. This is exactly what is expected (5J3]| of the variance in the sense that it recovers the behaviour of
the standard bootstrap particle filter (without blocking) which is dependent on the size of the entire field.

The blocking operation essentially reduces the large-scale filtering problem to multiple, smaller,
independent filtering problems, on each of which an independent particle filter mostly exhibits an error that
is well understood [8], [14]. Since the general nature (not the specific constants) of the variance bound is
the best one might expect, we will not focus on this bound going forward.

The main error component of interest here is that component introduced purely as a result of the blocking
operation (and not the sampling). Therefore, going forward we are largely concerned with
$\tilde{\pi}_n^\mu = \tilde{F}_n \cdots \tilde{F}_1 \mu$ where
$\tilde{F}_n = C_n B(\mathcal{K}_{\sigma(n)}) P$ and its ability to approximate the ideal, full, Bayesian filter
$F_n \cdots F_1 \mu$. The particle filter $\hat{\pi}_n^\mu$ in this case can be thought of as an approximation
of the ideal blocked filter $\tilde{\pi}_n^\mu$ and it is worth noting that other approximations to
$\tilde{\pi}_n^\mu$, separate to particle-based approximations, could be substituted.

In  summary,  the  main  contribution  is  reduced  to  a  study  on  adaptively  blocked  filtering  and  the  error 
introduced  through  adaptive  blocking  when  compared  to  the  ideal  Bayesian  filter.  The  particle  filtering  step 
is  given  to  show  how  one  may  approximate  the  adaptively  blocked  filter  in  practice  and  the  error  given  on  this 
particle  representation  is  noted  for  completeness  (this  error  is  as  expected  even  if  it  is  non-trivial  to  derive). 

The  bound  on  the  bias  introduced  due  to  blocking  is  now  stated. 

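Theorem 2 (Bounding the Bias). Suppose that $\varepsilon \le p^v(x, z^v) \le \varepsilon^{-1}$ for all $v \in V$ and $x, z \in \mathbb{X}$ with $\varepsilon > \varepsilon_0 = (1 - 1/(18\Delta^2))^{1/2\Delta}$.
Let $\beta = -(2r)^{-1} \log 18\Delta^2(1 - \varepsilon^{2\Delta}) > 0$. If $\sigma(s) = s \pmod m$, $s \in \mathbb{N}$, then for every $v \in V$ we have

$$\frac{1}{m}\sum_{k=0}^{m-1} |||\pi^x_{n-k} - \tilde\pi^x_{n-k}|||_v \le \frac{8e^{-\beta}}{1 - e^{-\beta}}(1 - \varepsilon^{2\Delta})\,\vartheta_m(v) \le \frac{8e^{-\beta}}{1 - e^{-\beta}}(1 - \varepsilon^{2\Delta}) \exp\left(-\beta e^{-\beta(\Delta_d(v) - \nabla_d(v))}\,\theta_m(v)\right)$$

for every $n \ge 0$ and $x \in \mathbb{X}$, where $\theta_m(v) = \frac{1}{m}\sum_{j=0}^{m-1} d(v, \partial K_j(v))$ and $\vartheta_m(v) = \frac{1}{m}\sum_{j=0}^{m-1} e^{-\beta d(v, \partial K_j(v))}$, with $K_j(v) \in \mathcal{K}_j$ the block containing $v$.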

This is a time-uniform bound on the average bias over a time length of $m$. Both inequalities in Theorem 2
imply that the bias introduced due to blocking can be spatially averaged (smoothed) across a cyclical application
of a sequence of partitions. Both inequalities collapse to the result of [14] in the case $m = 1$. The second
inequality, in general, over bounds the first inequality but may be more convenient for discussion as $\theta_m(v)$
may be easier than $\vartheta_m(v)$ to conceptualise. Following the analysis in [14], the goal was to derive a similarly
natured bound here, but one which honestly captures the spatial smoothing effect.
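The relationship between the two spatial terms can be illustrated numerically. The sketch below takes $\theta_m(v)$ as the average of the distances $d(v, \partial K_j(v))$ and $\vartheta_m(v)$ as the average of $e^{-\beta d(v, \partial K_j(v))}$, and compares $\vartheta_m(v)$ with the looser exponential term of the second inequality. The rate `beta` and the list of distances are hypothetical values chosen only for illustration, and the common prefactor of the bound is omitted.

```python
import math

def theta_m(dists):
    """Average distance to the block boundary over the m partitions."""
    return sum(dists) / len(dists)

def vartheta_m(dists, beta):
    """Average of exp(-beta * distance) over the m partitions."""
    return sum(math.exp(-beta * d) for d in dists) / len(dists)

def looser_term(dists, beta):
    """Spatial term of the second inequality: exp(-beta e^{-beta (max - min)} theta_m)."""
    spread = max(dists) - min(dists)  # plays the role of Delta_d(v) - nabla_d(v)
    return math.exp(-beta * math.exp(-beta * spread) * theta_m(dists))

beta = 0.5            # hypothetical rate
dists = [0, 1, 3, 2]  # hypothetical distances d(v, dK_j(v)), j = 0..3

jensen_lower = math.exp(-beta * theta_m(dists))  # Jensen's inequality gives a lower bound
first = vartheta_m(dists, beta)
second = looser_term(dists, beta)
print(jensen_lower, first, second)
```

With a single partition (`m = 1`) the spread is zero and all three quantities coincide, which is consistent with both inequalities collapsing to the same expression in that case.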

The spatial invariance of the error bound (or more specifically the bias bound) will be discussed in more
detail in the next section. We simply note here that if $\theta = \theta_m(v) = \frac{1}{m}\sum_{j=0}^{m-1} d(v, \partial K_j(v))$ where $K_j(v) \in \mathcal{K}_j$ for all
$v \in V$, then the bound really is spatially invariant. Such a situation occurs for a particular class of graphs (i.e.
random fields) discussed later. The more general case in which $\theta \le \frac{1}{m}\sum_{j=0}^{m-1} d(v, \partial K_j(v))$ is also discussed.

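A simple corollary follows in which there exists an ordering of $\mathcal{K}_0, \ldots, \mathcal{K}_{m-1}$ such that for a cyclical
sequence of partitions $\sigma(s) = s \pmod m$, $s \in \mathbb{N}$, we have

$$|||\pi^x_n - \tilde\pi^x_n|||_v \le \frac{8e^{-\beta}}{1 - e^{-\beta}}(1 - \varepsilon^{2\Delta})\,\vartheta_m(v) \le \frac{8e^{-\beta}}{1 - e^{-\beta}}(1 - \varepsilon^{2\Delta}) \exp\left(-\beta e^{-\beta(\Delta_d(v) - \nabla_d(v))}\,\theta_m(v)\right)$$

for some (at least one) $v \in V$.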

The  blocking  operation  contributes  the  bias  term  to  the  total  error  while  the  random  sampling  contributes 
the  variance  term.  As  previously  noted,  the  nature  of  the  variance  bound  is  as  expected  and  we  do  not  focus  on 
that  going  forward.  The  bias  is  determined  by  the  blocking  operator  which  conceptually,  at  any  time,  has  little 
effect  on  those  sites  far  removed  from  the  block  borders  due  to  the  local  dynamical  dependencies  assumed. 
Consequently,  the  bias  is  controlled  at  a  sub-block  level  in  that  the  bias  at  each  site  is  controlled,  on  average, 
by  that  site’s  distance  to  the  border  of  the  blocks  which  contain  it.  If  we  can  average  this  distance  across  the 
field  through  adaptive  partitioning  then  we  should  be  averaging  the  bias  across  the  field. 

From the computational viewpoint, the variance bound implies that one need only consider the size of
the blocks (not the entire random field) when picking a value for $N$ to control the error. This (ideally) leads to
a  reduction  in  the  computational  requirements  of  the  filtering  problem  and  is  the  underlying  motivation  for 
blocking.  This  gain  comes  at  a  price,  in  the  form  of  a  bias  introduced  due  to  blocking.  Here  we  will  consider 
adaptive  blocking  as  a  way  of  smoothing  the  bias  error  over  the  random  field  or  controlling  the  bias  in  a  more 
precise  way. 
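To make the computational argument concrete, the following sketch inverts a variance bound of the form $\alpha e^{\beta d}/\sqrt{N}$ for the number of particles $N$, once with $d$ equal to the largest block size and once with $d$ equal to the size of the whole field. The constants `alpha` and `beta`, the tolerance and the two sizes are hypothetical, and it is an assumption of this illustration that the same constants apply in both cases.

```python
import math

def particles_needed(alpha, beta, dim, tol):
    """Smallest N such that alpha * exp(beta * dim) / sqrt(N) <= tol."""
    return math.ceil((alpha * math.exp(beta * dim) / tol) ** 2)

alpha, beta, tol = 1.0, 0.5, 0.1  # hypothetical constants and error tolerance

n_block = particles_needed(alpha, beta, dim=4, tol=tol)    # largest block has 4 sites
n_field = particles_needed(alpha, beta, dim=100, tol=tol)  # whole field has 100 sites
print(n_block, n_field)
```

The point is only qualitative: the particle count grows exponentially in the dimension that enters the bound, so replacing the field size by the block size changes the requirement by many orders of magnitude.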

Finally, we refer to [14] for a discussion on the mixing assumption $\varepsilon \le p^v(x, z^v) \le \varepsilon^{-1}$ with the non-standard
requirement $\varepsilon_0 < \varepsilon < 1$ with $\varepsilon_0 > 0$. The restrictions on $\varepsilon_0 < \varepsilon < 1$ are relaxed partially in [15]. This requirement does
not alter the significance of the result at the proof-of-concept stage. Moreover, non-optimal restrictions on the
system model in this form appear frequently in similar studies; e.g. see [8]. Typically, the empirical evidence
suggests a far more relaxed application of the algorithms in question is permissible.


4.1  Proof  Strategy

The bias and variance bounds are treated separately but lead to a total error bound. The proof strategy is
adopted from Rebeschini and van Handel [14] and much of the analysis required is identical and not repeated.
The strategy in [14] is inspired in part by the time-uniform particle filter convergence results [7].

In the case of the bias $|||\pi_n - \tilde\pi_n|||_J$, one first derives a local stability property for the filter which implies
that the marginal over a local set $J \subseteq V$ of the initial state $\mu$ is forgotten exponentially fast. Such a property also
implies that any approximation errors in, say, the initial state are also forgotten. It then follows that if one can
bound the one-step approximation error $|||F_n\tilde\pi_{n-1} - \tilde F_n\tilde\pi_{n-1}|||_J$ at any time, then in conjunction with the local
stability property one will obtain a time-uniform bound on the bias over a local region of the field.

In the case of the variance $|||\tilde\pi_n - \hat\pi_n|||_J$, a similar idea is used except one first establishes stability for the ideal
adaptively blocked filter $\tilde\pi_n$. Then, one must bound the one-step approximation error $|||\tilde F_n\hat\pi_{n-1} - \hat F_n\hat\pi_{n-1}|||_J$ at
any time. Putting the stability property and the bound on the one-step approximation together, one achieves
the desired time-uniform bound on the variance of a block in the adaptively blocked filter.

We have obviously glossed over many of the intricacies involved in the proof in this summary. For example,
in the case of the bias, the property introduced in [14] and referred to as the decay of correlations must be
established to hold uniformly in time for the ideal block filter $\tilde\pi_n$. This property captures a notion of spatial
stability where the state at some site in the random field is forgotten as one moves away from that site. Rebeschini
et al. provide a novel measure of this decay that allows them to establish local stability of the filter and
to establish a bound on the one-step approximation error $|||F_n\tilde\pi_{n-1} - \tilde F_n\tilde\pi_{n-1}|||_J$. Conceptually, a property like
the decay of correlations is necessary to establish such results.

We refer the reader to [14] for a broader, more insightful, discussion on the strategy. Now, we note specifically
what ideas must be altered to account for a change in partition from one time to the next.

4.1.1  Steps  to  Prove  the  Bias  Bound 

Very roughly speaking the steps needed to prove the bound on the bias include: 1) establishing the local
stability property for the nonlinear filter $\pi_n$; 2) establishing that the decay of correlations property holds
uniformly in time for the ideal adaptively blocked filter $\tilde\pi_n$; 3) establishing a bound on the one-step
approximation error $|||F_n\tilde\pi_{n-1} - \tilde F_n\tilde\pi_{n-1}|||_J$ that holds at any time; and finally, 4) putting it all together.

The local stability of $\pi_n$ depends on the decay of correlations property assumed on $\mu$ and is otherwise independent
of the blocking procedure. Hence, we can take this result as a given [14]. The one-step approximation
error $|||F_n\tilde\pi_{n-1} - \tilde F_n\tilde\pi_{n-1}|||_J$ is dependent on the blocking procedure but only on the partition in effect during a
single time step, and this partition is otherwise arbitrary, so we can take this result as given [14].

To prove our case, we only need to establish that the decay of correlations property holds uniformly in time
for the filter distribution $\tilde\pi_n$ when given the changing partitions. Once this is established, it is just a matter of
collecting the relevant results and finalising the bound on $|||\pi_n - \tilde\pi_n|||_J$. We follow through with this last step
and show how the spatial averaging effect of $\theta_m(v) = \frac{1}{m}\sum_{j=0}^{m-1} d(v, \partial K_j(v))$ comes out during this procedure.

The detailed proof is given in a subsequent section drawing from [14] as often as possible.

4.1.2  Steps  to  Prove  the  Variance  Bound 

Roughly again, the steps needed to prove the bound on the variance include: 1) establishing the stability
of the ideal adaptively blocked filter $\tilde\pi_n$; 2) establishing a bound on the one-step approximation error
$|||\tilde F_n\hat\pi_{n-1} - \hat F_n\hat\pi_{n-1}|||_J$ that holds at any time; and finally, 3) putting it all together.

Firstly, we do not have to deal with any correlation-like properties in the case of the variance bound and as
noted the final result is as expected. So things may appear simpler initially. Unfortunately, proving stability for
the ideal adaptively blocked filter $\tilde\pi_n$ is not trivial [14]. Because the stability of $\tilde\pi_n$ is dependent on the change
of partition we must re-establish that this stability result holds in the case of adaptively changing partitions
for completeness. Moreover, for technical reasons related to the use of the norm $|||\cdot|||_J$, the authors in [14]
consider instead a two-step approximation error $|||\tilde F_{n+1}\tilde F_n\hat\pi_{n-1} - \tilde F_{n+1}\hat F_n\hat\pi_{n-1}|||_J$ and bound this term at any
time. Because a change in partition comes into play over two steps, we must re-establish that this two-step
approximation error is bounded under a change of partition at any time. We then bring the relevant results
together and finalise the bound on $|||\tilde\pi_n - \hat\pi_n|||_J$.

As  noted,  this  is  the  strategy  taken  to  derive  the  variance  bound  but,  since  our  main  concern  here  is  the 
spatial  aspects  of  the  filtering  problem  and  the  related  (adaptive)  blocking  operation,  we  do  not  give  the  details. 
Many of the technical lemmas involved in this analysis also follow directly from [14] and those results requiring
a  modification  to  their  proofs  need  only  an  arguably  minor  re-analysis  and  modification.  The  final  result  is  as 
expected  and  the  authors  are  available  to  provide  the  variance  bound  proof  details  on  request. 


5  Discussion  on  the  Adaptively  Blocked  Filter  and  Spatial  Smoothing 

The main result in the previous section is a total error bound on $|||\pi_n - \hat\pi_n|||_v$ for all $v \in V$, which can be
decomposed (and is actually derived) in terms of a bound on the variance (induced by the random Monte Carlo
procedure) and a bound on the bias (induced by the blocking operator). We focus on the bias bound in this
section and its relevance as it pertains to the dependence of the total error on the particular spatial site $v$.

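The main point of interest in this work is the effect of the (adaptive) blocking operation on the total error
bound which shows up purely via the systematic bias. To this end, we compare the bound on the bias proposed
here with the bias bound proposed in the motivating paper [14] by Rebeschini et al.,

$$\mathrm{bias}_B(v) \le \Theta\Big(\tfrac{1}{m}\sum_{j=0}^{m-1} e^{-d(v, \partial K_j(v))}\Big) = \Theta(\vartheta_m(v)) \le \Theta\Big(e^{-\frac{1}{m}\sum_{j=0}^{m-1} d(v, \partial K_j(v))}\Big) = \Theta\big(e^{-\theta_m(v)}\big) \qquad \text{Bertoli et al.}$$

$$\mathrm{bias}_R(v) \le \Theta\big(e^{-d(v, \partial K(v))}\big) \qquad \text{Rebeschini et al.}$$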

where Rebeschini et al. only ever consider a single partition. Here, $\mathrm{bias}_B(v)$ and $\mathrm{bias}_R(v)$ can be taken as the
average of the bias over a time period of length $m$. We remove any unnecessary constants from the expressions
that cloud the conceptual discussion. Here, we use $\Theta(\cdot)$ to capture only that spatially dependent component of
the bias bound noting that the constant 'out-the-front' is equivalent in both cases [14]. For conceptual, rather
than technical, reasons we consider the slightly looser bound $\mathrm{bias}_B(v) \le \Theta(e^{-\theta_m(v)})$ during discussion.

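We highlight that if $\theta = \theta_m(v) = \frac{1}{m}\sum_{j=0}^{m-1} d(v, \partial K_j(v))$ for all $v \in V$ then

$$\Theta\Big(e^{-\frac{1}{m}\sum_{j=0}^{m-1} d(v, \partial K_j(v))}\Big) = \Theta\big(e^{-\theta_m(v)}\big) = \Theta\big(e^{-\theta}\big)$$

and the bound is truly spatially invariant. This is part of the motivation for this work and is explored in more
detail now. Consider Figure 1.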


Figure 1: The leftmost circulant graph depicts the underlying graphical structure of the random field. The
remaining five graphs from second left to far right highlight a sequence of independent partitions of the original
graph in which each partition consists of a set of two (dark grey) and three (light grey) vertices.

Pick any site $v \in V$ in the graph depicted in Figure 1 and note that $\theta = \theta_m(v) = 1/5$. One then clearly has a
spatially uniform bound on $\mathrm{bias}_B(v)$. Consider now any single partition alone and note that for four out of the
five sites we have $d(v, \partial K(v)) = 0$ and at one site we have $d(v, \partial K(v)) = 1$ which implies, as noted by Rebeschini
et al., that the bound on $\mathrm{bias}_R(v)$ is not spatially uniform. The adaptive blocking procedure is averaging the
distance $d(v, \partial K_j(v))$ through the use of multiple partitions which results in a kind of spatial error smoothing.

The bound on the bias of the cyclically blocked filter at every site $v \in V$ is completely independent of the
site $v$ in every case in which $\theta = \theta_m(v)$, $\forall\, v \in V$. Such cases may occur in practice; e.g. a sequence of partitions
on any regular lattice wrapped on a torus can be derived that obeys this property, see Figure 2.


Figure  2:  A  lattice  wrapped  on  a  torus  with  no  physical  boundaries. 


In the general case, in which $\theta \le \theta_m(v)$ for some $v \in V$, the spatial smoothing property of the cyclical blocking
filter is still in effect and reduces, as compared to [14], the degree of spatial inhomogeneity (on average) as
it applies to the bias bound. Essentially, the sites on the borders of a block in $\mathcal{K}$ are typically not on the borders
of a block in $\mathcal{K}'$, while the sites at the centre of a block in $\mathcal{K}$ are typically not at the centre of a block in $\mathcal{K}'$.
Given a sufficient number of well-chosen partitions of this type, one can ensure the average distance of a
site to a border is smoothed (or spatially averaged) across all sites. Of course, this means that some particular
sites may be worse off than they were under a single partition (this is an obvious consequence of averaging).

For example, consider again the case in Figure 1 but suppose only the four leftmost partitions are employed
by the cyclically blocked particle filter. Then $\theta_m(v_0) = \theta_m(v_1) = \theta_m(v_3) = \theta_m(v_4) = 1/4$ while $\theta_m(v_2) = 0$.
Clearly, one has a more desirable bound on the bias on average in this case than in the case in which only a
single partition is considered, albeit complete spatial homogeneity is not achieved. Considering additional
partitions in a large-scale random field will be of even further benefit than that exposited in this toy example.
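The averages quoted for the five-site toy example can be reproduced with a short script. The sketch below assumes a neighbourhood radius $r = 1$ on the 5-cycle, takes the boundary of a block $K$ to be the sites of $K$ whose neighbourhood is not contained in $K$ (the convention of [14], assumed here), and labels each partition by the centre of its three-vertex block. The function names and the site labelling are hypothetical and need not match the ordering drawn in Figure 1.

```python
def cycle_dist(a, b, n=5):
    """Graph distance between sites a and b on the n-cycle."""
    return min((a - b) % n, (b - a) % n)

def partition(centre, n=5):
    """Three-vertex block centred at `centre` plus the block of the two remaining vertices."""
    three = {(centre - 1) % n, centre, (centre + 1) % n}
    return [three, set(range(n)) - three]

def dist_to_boundary(v, part, n=5, r=1):
    """d(v, dK(v)), with dK = {u in K : N(u) is not contained in K} (assumed definition)."""
    block = next(K for K in part if v in K)
    boundary = {u for u in block
                if any((u + s) % n not in block for s in range(-r, r + 1))}
    return min(cycle_dist(v, u, n) for u in boundary)

def theta(centres, n=5):
    """theta_m(v) at every site: boundary distance averaged over the listed partitions."""
    parts = [partition(c, n) for c in centres]
    return [sum(dist_to_boundary(v, p, n) for p in parts) / len(parts)
            for v in range(n)]

single = theta([2])            # one fixed partition
four = theta([0, 1, 3, 4])     # four of the five rotated partitions
five = theta([0, 1, 2, 3, 4])  # all five rotated partitions
print(single, four, five)
```

Under these assumptions the single partition leaves four sites at distance zero and one at distance one, the four-partition cycle gives 1/4 at four sites and 0 at the remaining one, and the full five-partition cycle gives 1/5 everywhere.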

Note  finally  that  in  both  Rebeschini  et  al.  and  here  the  spatial  uniformity  of  the  error  (or  the  bias  more 
specifically)  is  often  referred  to  via  the  bound  on  the  bias  and  not  the  bias  or  error  itself.  This  is  a  consequence 
of  the  technical  analysis,  but  for  all  practical  purposes  it  would  appear  obvious  that  the  spatial  homogeneity 
of  the  error  itself  is  of  the  same  nature  as  that  noted  by  the  bound  applicable  to  that  error  (even  if  such  bounds 
are otherwise quite loose). That is, the blocked filter of Rebeschini et al. [14] would clearly seem to favour
those  sites  far  from  the  border  of  the  individual  blocks  in  terms  of  the  actual  performance  of  the  filter,  while 
the  cyclically  blocked  filter  proposed  in  this  work  is  clearly,  in  some  sense,  averaging  out  this  favouritism  and 
its  effect  on  the  actual  filter  performance  at  any  site.  The  point  is  that  the  spatial  relationship  of  the  error  is 
typically  noted  in  terms  of  the  bias  bounds  but  intuitively/ conceptually  the  discussions  on  homogeneity  (or 
inhomogeneity)  of  a  particular  filter  apply  (seemingly)  also  to  the  error/bias  itself. 


6  Proof  of  the  Bias  Bound 


Rebeschini et al. [14] introduced an important concept referred to as the decay of correlations which captures,
in  a  very  technical  manner,  the  intuitive  notion  of  spatial  stability  where  the  state  at  some  site  in  the  random 
field  should  be  forgotten  as  one  moves  away  from  that  site.  This  notion  plays  a  crucial  role  in  the  convergence 
of  the  bias  due  to  the  blocking  operation. 

It is important to note that the analysis and the spatial stability property put forth in [14] is based in part
on those ideas of temporal stability introduced in [8] and used to establish time-uniform convergence results
for  the  standard  bootstrap  particle  filter. 

Recall that the dynamics of the underlying process $(X_n)_{n \ge 0}$ are local in the sense that $p^v(x, z^v)$ depends
only on $x^{N(v)}$ where $x^J = (x^j)_{j \in J}$ for $J \subseteq V$. Here, $N(v) = \{v' \in V : d(v, v') \le r\}$ for some $r \in \mathbb{N}$ captures the local
neighbourhood of sites on which site $v$ explicitly depends.

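We now briefly review the measure introduced in [14] on the decay of correlations property. For any probability
measure $\mu$ on $\mathbb{X}$ and for $x, z \in \mathbb{X}$ with $v \in V$ define

$$\mu^v_{x,z}(A) \triangleq \mathbf{P}^\mu\big[X^v_{n-1} \in A \,\big|\, X^{V \setminus \{v\}}_{n-1} = x^{V \setminus \{v\}},\ X_n = z\big] = \frac{\int \mathbf{1}_A(x^v) \prod_{u \in N(v)} p^u(x, z^u)\, \mu^v_x(dx^v)}{\int \prod_{u \in N(v)} p^u(x, z^u)\, \mu^v_x(dx^v)}$$

for any $A \in \mathcal{X}^v$ and where

$$\mu^v_x(dx^v) \triangleq \mathbf{P}^\mu\big[X^v_{n-1} \in dx^v \,\big|\, X^{V \setminus \{v\}}_{n-1} = x^{V \setminus \{v\}}\big].$$

Then

$$C^\mu_{vv'} = \frac{1}{2} \sup_{z \in \mathbb{X}}\ \sup_{x, \tilde x \in \mathbb{X}\,:\, x^{V \setminus \{v'\}} = \tilde x^{V \setminus \{v'\}}} \big\| \mu^v_{x,z} - \mu^v_{\tilde x,z} \big\|$$
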

for $v, v' \in V$. This quantity $C^\mu_{vv'}$ captures the correlation between two sites $v, v' \in V$ in the random
field under the assumed field model. A little more precisely, this term is measuring the maximal total variation
at a site $v$ that may arise due to a perturbation at site $v'$. Now define


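$$\mathrm{Corr}(\mu, \beta) = \max_{v \in V} \sum_{v' \in V} e^{\beta d(v, v')}\, C^\mu_{vv'}$$

with $\beta > 0$. This quantity $\mathrm{Corr}(\mu, \beta)$ is a measure of the total degree of correlation decay for the measure $\mu$
given a rate parameter $\beta$. The maximising site $v$ can be interpreted as the most sensitive site in the field. To understand
this quantity $\mathrm{Corr}(\mu, \beta)$ a little more conceptually, suppose the most sensitive site $v$ is known a priori. Then
suppose that $\sum_{v' \in V} e^{\beta d(v, v')} C^\mu_{vv'} = 1$. Every term in this sum is non-negative, so each satisfies $C^\mu_{vv'} \le e^{-\beta d(v, v')}$,
and the same holds at every other site because $v$ attains the maximum. It follows that the correlation between any two sites in the random field
decays as a function of the distance between those two sites at an exponential rate of at least $\beta$.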
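The definition of $\mathrm{Corr}(\mu, \beta)$ is a weighted row sum maximised over sites, which is simple to evaluate once the coefficients $C^\mu_{vv'}$ are available. The sketch below does not compute those coefficients from a model; it assumes a hypothetical coefficient matrix that decays like $c\, e^{-\gamma d(v, v')}$ on a six-site path graph, purely to show how the choice of $\beta$ interacts with the true decay rate.

```python
import math

def corr(C, dist, beta):
    """Corr(mu, beta) = max over v of sum over v' of exp(beta * d(v, v')) * C[v][v']."""
    n = len(C)
    return max(sum(math.exp(beta * dist[v][u]) * C[v][u] for u in range(n))
               for v in range(n))

n = 6  # sites on a path graph (hypothetical field)
dist = [[abs(v - u) for u in range(n)] for v in range(n)]

gamma, c = 2.0, 0.05  # hypothetical decay rate and scale of the coefficients
C = [[0.0 if v == u else c * math.exp(-gamma * dist[v][u]) for u in range(n)]
     for v in range(n)]

slow = corr(C, dist, beta=0.5)  # rate parameter below the assumed decay rate
fast = corr(C, dist, beta=3.0)  # rate parameter above the assumed decay rate
print(slow, fast)
```

When the computed value is at most one, each coefficient is bounded by $e^{-\beta d(v, v')}$, which is the interpretation given in the text.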


With $\mathrm{Corr}(\mu, \beta)$ defined as such we borrow directly from [14] the local stability result on $\pi_n$ which requires
only that the initial condition $\mu$ satisfy a decay of correlations property. Recall that $\Delta = \max_{v \in V} \mathrm{card}\{v' \in V :
d(v, v') \le r\}$ defines the size of the largest neighbourhood in $V$.

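Lemma 1 (Local Filter Stability [14]). Suppose there exists $\varepsilon > 0$ such that $\varepsilon \le p^v(x, z^v) \le \varepsilon^{-1}$ for all $v \in V$ and
$x, z \in \mathbb{X}$. Let $\mu, \nu$ be probability measures on $\mathbb{X}$, and suppose that

$$\mathrm{Corr}(\mu, \beta) \le \frac{1}{2} - 3(1 - \varepsilon^{2\Delta})\, e^{2\beta r} \Delta^2$$

for a sufficiently small constant $\beta > 0$. Then

$$|||F_n \cdots F_{s+1}\mu - F_n \cdots F_{s+1}\nu|||_J \le 2e^{-\beta(n-s)} \sum_{v \in J} \max_{v' \in V} e^{-\beta d(v, v')} \sup_{x, z \in \mathbb{X}} \big\| \mu^{v'}_{x,z} - \nu^{v'}_{x,z} \big\|$$

for every $J \subseteq V$ and $s < n$.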

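Now consider any partition $\mathcal{K}_i$. We define a correlation depending on the given partition. Fix a probability
measure $\mu$ on $\mathbb{X}$ and $x, z \in \mathbb{X}$, $v \in V$, $K \in \mathcal{K}_i$ and then let

$$\mu^{v,K}_{x,z}(A) \triangleq \mathbf{P}^\mu\big[X^v_{n-1} \in A \,\big|\, X^{V \setminus \{v\}}_{n-1} = x^{V \setminus \{v\}},\ X^K_n = z^K\big] = \frac{\int \mathbf{1}_A(x^v) \prod_{u \in N(v) \cap K} p^u(x, z^u)\, \mu^v_x(dx^v)}{\int \prod_{u \in N(v) \cap K} p^u(x, z^u)\, \mu^v_x(dx^v)}$$

for any $A \in \mathcal{X}^v$. Now define

$$C^{\mu, \mathcal{K}_i}_{vv'} = \frac{1}{2} \max_{K \in \mathcal{K}_i}\ \sup_{z \in \mathbb{X}}\ \sup_{x, \tilde x \in \mathbb{X}\,:\, x^{V \setminus \{v'\}} = \tilde x^{V \setminus \{v'\}}} \big\| \mu^{v,K}_{x,z} - \mu^{v,K}_{\tilde x,z} \big\|$$

and

$$\mathrm{Corr}_{\mathcal{K}_i}(\mu, \beta) = \max_{v \in V} \sum_{v' \in V} e^{\beta d(v, v')}\, C^{\mu, \mathcal{K}_i}_{vv'}$$

for any partition $\mathcal{K}_i$ in the cyclically blocked particle filter sequence. One can interpret the block adapted
measure $\mathrm{Corr}_{\mathcal{K}_i}(\mu, \beta)$ in much the same way as $\mathrm{Corr}(\mu, \beta)$.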

The reason behind formulating the measure $\mathrm{Corr}(\mu, \beta)$ in such a way is technical and follows from the program
put forth in [14]. Here we are only modifying this program to account for changing partitions and in
doing so we seek to draw on the detailed analysis put forth in [14] as much as possible. The reason for introducing
$\mathrm{Corr}_{\mathcal{K}_i}(\mu, \beta)$ is now explained. In order to establish the one-step approximation error we must bound
$\mathrm{Corr}(\tilde\pi_n, \beta)$ uniformly in time. The authors of [14] note the difficulty in working directly with $\mathrm{Corr}(\tilde\pi_n, \beta)$ and
instead propose to bound $\mathrm{Corr}_{\mathcal{K}_{\sigma(n)}}(\tilde\pi_n, \beta)$ and then use the following result to indirectly control $\mathrm{Corr}(\tilde\pi_n, \beta)$.


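Lemma 2 ([14]). For any probability measure $\nu$, partition $\mathcal{K}_i$ and $\beta > 0$, we have

$$\mathrm{Corr}(\nu, \beta) \le (1 - \varepsilon^{2\Delta})\, e^{2\beta r} \Delta^2 + 2\varepsilon^{-2\Delta}\, \mathrm{Corr}_{\mathcal{K}_i}(\nu, \beta)$$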


Given this result it follows that a time-uniform bound on $\mathrm{Corr}_{\mathcal{K}_{\sigma(n)}}(\tilde\pi_n, \beta)$ leads easily to a time-uniform
bound on $\mathrm{Corr}(\tilde\pi_n, \beta)$.

Because $\tilde\pi_n$ is dependent on the changing partitions, which is central to the adaptively blocked filter, it
follows that we must re-establish that a time-uniform bound on $\mathrm{Corr}_{\mathcal{K}_{\sigma(n)}}(\tilde\pi_n, \beta)$ exists in the case of interest
here.

We  prove  a  technical  lemma  that  will  be  used  subsequently. 

Lemma 3. For any strictly positive $a, b \in \mathbb{R}_+$ and $x \in \mathbb{R}_+$ with $x \ge 1$ the following holds: $|ax - b/x| \le |a - b|x^2$.

Proof. If $a \ge b$ or even $ax \ge b/x$ then

$$\Big| ax - \frac{b}{x} \Big| = ax - bx + bx - \frac{b}{x} = |a - b|x + b\,\frac{x^2 - 1}{x} \le |a - b|x \le |a - b|x^2$$

Note $ax \ge b/x$ implies $x \ge \sqrt{b/a}$ for any $a, b, x > 0$. This leaves the case $b > a$ and $1 \le x < \sqrt{b/a}$. In this case

$$\Big| ax - \frac{b}{x} \Big| = |ax^2 - b|\,\frac{1}{x} \le |ax^2 - b| \le |a - b| \le |a - b|x^2$$

by continuity in $1 \le x < \sqrt{b/a}$ for any fixed $b > a > 0$. □


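The following proposition relates $\mathrm{Corr}_{\mathcal{K}_i}(\mu, \beta)$ under one partition to the same quantity considered under
a different partition and is needed to obtain the time-uniform bound on $\mathrm{Corr}_{\mathcal{K}_{\sigma(n)}}(\tilde\pi_n, \beta)$.

Proposition 1. Suppose there exists $\varepsilon > 0$ such that $\varepsilon \le p^v(x, z^v) \le \varepsilon^{-1}$. For any probability measure $\mu$, partitions
$\mathcal{K}_i$, $\mathcal{K}_j$ and $\beta > 0$, we have

$$\mathrm{Corr}_{\mathcal{K}_i}(\mu, \beta) \le \varepsilon^{-8\Delta}\, \mathrm{Corr}_{\mathcal{K}_j}(\mu, \beta)$$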

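Proof. Pick $K_i \in \mathcal{K}_i$ and $K_j \in \mathcal{K}_j$ with $K = K_i \cap K_j \ne \emptyset$. Then by definition

$$\mu^{v,K_i}_{x,z}(A) = \frac{\int \mathbf{1}_A(x^v) \prod_{u \in N(v) \cap K_i \setminus K} p^u(x, z^u)\, \mu^{v,K}_{x,z}(dx^v)}{\int \prod_{u \in N(v) \cap K_i \setminus K} p^u(x, z^u)\, \mu^{v,K}_{x,z}(dx^v)} \le \frac{\int \mathbf{1}_A(x^v)\, \varepsilon^{-|N(v) \cap K_i \setminus K|} \prod_{u \in N(v) \cap K_j \setminus K} \frac{p^u(x, z^u)}{\varepsilon}\, \mu^{v,K}_{x,z}(dx^v)}{\int \varepsilon^{|N(v) \cap K_i \setminus K|} \prod_{u \in N(v) \cap K_j \setminus K} \frac{p^u(x, z^u)}{\varepsilon^{-1}}\, \mu^{v,K}_{x,z}(dx^v)}$$

$$\le \varepsilon^{-2[\,|N(v) \cap K_i \setminus K| + |N(v) \cap K_j \setminus K|\,]}\, \mu^{v,K_j}_{x,z}(A) \le \varepsilon^{-4\Delta}\, \mu^{v,K_j}_{x,z}(A)$$

for $A \in \mathcal{X}^v$. Alternatively,

$$\mu^{v,K_i}_{x,z}(A) \ge \varepsilon^{4\Delta}\, \mu^{v,K_j}_{x,z}(A)$$

noting that $\mathcal{K}_i$ and $\mathcal{K}_j$ are anyway arbitrary. Fix $A \in \mathcal{X}^v$ and $x, \tilde x \in \mathbb{X}$ and suppose, without loss of generality,
$\mu^{v,K_i}_{x,z}(A) \ge \mu^{v,K_i}_{\tilde x,z}(A)$. It follows

$$\big| \mu^{v,K_i}_{x,z}(A) - \mu^{v,K_i}_{\tilde x,z}(A) \big| = \mu^{v,K_i}_{x,z}(A) - \mu^{v,K_i}_{\tilde x,z}(A) \le \varepsilon^{-4\Delta} \mu^{v,K_j}_{x,z}(A) - \varepsilon^{4\Delta} \mu^{v,K_j}_{\tilde x,z}(A) \le \varepsilon^{-8\Delta} \big| \mu^{v,K_j}_{x,z}(A) - \mu^{v,K_j}_{\tilde x,z}(A) \big|$$

using Lemma 3. Taking the supremum over $A \in \mathcal{X}^v$ gives

$$\big\| \mu^{v,K_i}_{x,z} - \mu^{v,K_i}_{\tilde x,z} \big\| \le \varepsilon^{-8\Delta} \big\| \mu^{v,K_j}_{x,z} - \mu^{v,K_j}_{\tilde x,z} \big\|$$

and from this the result of the proposition follows easily. □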


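We state the following lemma which bounds the change in the correlation decay over a one-step application
of the ideal cyclically blocked filter for a fixed partition.

Lemma 4 ([14]). Suppose there exists $\varepsilon > 0$ such that $\varepsilon \le p^v(x, z^v) \le \varepsilon^{-1}$ for all $v \in V$ and $x, z \in \mathbb{X}$. For any
probability measure $\nu$, partition $\mathcal{K}_{\sigma(s)}$ and sufficiently small $\beta > 0$ such that

$$\mathrm{Corr}_{\mathcal{K}_{\sigma(s)}}(\nu, \beta) \le \frac{1}{2} - (1 - \varepsilon^2)\, e^{\beta(r+1)} \Delta$$

then

$$\mathrm{Corr}_{\mathcal{K}_{\sigma(s)}}(\tilde F_s \nu, \beta) \le 2(1 - \varepsilon^{2\Delta})\, e^{2\beta r} \Delta^2$$

for any $s \in \mathbb{N}$.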

Now we are in a position to prove the time-uniform bound on $\mathrm{Corr}_{\mathcal{K}_{\sigma(n)}}(\tilde\pi_n, \beta)$. We have via Lemma 4 a
bound on the change in the correlation decay over any single time step. We have via Proposition 1 a relationship
between the decay of correlations for a measure under two different partitions. We combine these results
and iterate to get the desired time-uniform bound as now shown.

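Lemma 5. Assume $\varepsilon \le p^v(x, z^v) \le \varepsilon^{-1}$ for all $v \in V$ and $x, z \in \mathbb{X}$ with

$$\varepsilon > \varepsilon_0 = \big(1 - 1/(16\Delta^2)\big)^{1/2\Delta}$$

Let $\mu$ be a probability measure on $\mathbb{X}$ and $\mathcal{K}_{\sigma(0)}$ a partition of $V$ such that

$$\mathrm{Corr}_{\mathcal{K}_{\sigma(0)}}(\mu, \beta) \le 1/8$$

where $\beta = -\frac{1}{2r} \log[16\Delta^2(1 - \varepsilon^{2\Delta})] > 0$. Then

$$\mathrm{Corr}_{\mathcal{K}_{\sigma(n)}}(\tilde\pi^\mu_n, \beta) \le 1/8$$

for all $n \ge 0$.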


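Proof. First, $\varepsilon > \varepsilon_0 = (1 - 1/(16\Delta^2))^{1/2\Delta}$ implies $\varepsilon^{-8\Delta} < 2$ and $(1 - \varepsilon^2)\, e^{\beta(r+1)} \Delta < 1/16$. Thus, given $\mathrm{Corr}_{\mathcal{K}_{\sigma(0)}}(\mu, \beta) \le 1/8$ we have

$$\mathrm{Corr}_{\mathcal{K}_{\sigma(1)}}(\mu, \beta) \le 2\, \mathrm{Corr}_{\mathcal{K}_{\sigma(0)}}(\mu, \beta) \le \frac{1}{4}$$

via Proposition 1. Now, given $\mathrm{Corr}_{\mathcal{K}_{\sigma(1)}}(\mu, \beta) \le 1/4 < 7/16 \le 1/2 - (1 - \varepsilon^2)\, e^{\beta(r+1)} \Delta$ we have

$$\mathrm{Corr}_{\mathcal{K}_{\sigma(1)}}(\tilde\pi^\mu_1, \beta) = \mathrm{Corr}_{\mathcal{K}_{\sigma(1)}}(\tilde F_1 \mu, \beta) \le 2(1 - \varepsilon^{2\Delta})\, e^{2\beta r} \Delta^2 \le \frac{1}{8}$$

via Lemma 4. Just restart the argument and iterate to get $\mathrm{Corr}_{\mathcal{K}_{\sigma(n)}}(\tilde\pi^\mu_n, \beta) \le 1/8$ for any $n \ge 0$. □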
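The constants appearing in this proof can be checked for a concrete parameter choice. The sketch below evaluates $\varepsilon_0$, $\beta$ and the three quantities used in the argument (the factor from Proposition 1, the slack term in the hypothesis of Lemma 4 and the one-step bound of Lemma 4). The function name, the dictionary keys and the values $\Delta = 3$, $r = 1$, $\varepsilon = 0.9995$ are hypothetical and chosen only so that $\varepsilon_0 < \varepsilon < 1$.

```python
import math

def lemma5_constants(delta, r, eps):
    """Evaluate the constants used in the proof of Lemma 5 for given delta, r and eps."""
    eps0 = (1.0 - 1.0 / (16.0 * delta**2)) ** (1.0 / (2.0 * delta))
    if not eps0 < eps < 1.0:
        raise ValueError("need eps0 < eps < 1")
    beta = -math.log(16.0 * delta**2 * (1.0 - eps ** (2 * delta))) / (2.0 * r)
    return {
        "eps0": eps0,
        "beta": beta,
        # factor relating Corr under two different partitions (Proposition 1)
        "partition_factor": eps ** (-8 * delta),
        # term subtracted from 1/2 in the hypothesis of Lemma 4
        "lemma4_slack": (1.0 - eps**2) * math.exp(beta * (r + 1)) * delta,
        # right-hand side of the conclusion of Lemma 4
        "one_step": 2.0 * (1.0 - eps ** (2 * delta)) * math.exp(2 * beta * r) * delta**2,
    }

vals = lemma5_constants(delta=3, r=1, eps=0.9995)  # hypothetical model parameters
for name, value in vals.items():
    print(name, value)
```

With $\beta$ defined as in the lemma, the one-step quantity equals $1/8$ by construction, which is what allows the argument to be restarted at each time step.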

Now we have a time-uniform bound on $\mathrm{Corr}_{\mathcal{K}_{\sigma(n)}}(\tilde\pi^\mu_n, \beta)$ and we can use Lemma 2 to arrive at the result we
want which is a time-uniform bound on the decay of correlation measure of interest.

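Lemma 6 (Bounding the Decay of Correlations). Assume $\varepsilon \le p^v(x, z^v) \le \varepsilon^{-1}$ for all $v \in V$ and $x, z \in \mathbb{X}$ with

$$\varepsilon > \varepsilon_0 = \big(1 - 1/(16\Delta^2)\big)^{1/2\Delta}$$

Let $\beta = -(2r)^{-1} \log 16\Delta^2(1 - \varepsilon^{2\Delta}) > 0$. Then

$$\mathrm{Corr}(\tilde\pi^\mu_n, \beta) \le \frac{1}{3}$$

for every $n \ge 0$ and any probability measure $\mu$ on $\mathbb{X}$.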

Recall the program required to prove the bias bound: 1) establish the local stability property for the nonlinear
filter $\pi_n$; 2) establish that the decay of correlations property holds uniformly in time for the ideal cyclically
blocked filter $\tilde\pi_n$; 3) establish a bound on the one-step approximation error $|||F_n\tilde\pi_{n-1} - \tilde F_n\tilde\pi_{n-1}|||_J$
that holds at any time; and finally, 4) put it all together.

All that remains is to show that $|||F_n\tilde\pi_{n-1} - \tilde F_n\tilde\pi_{n-1}|||_J$ is bounded at any time and then to put it all together.
Since this one-step approximation error is independent of any change in partition we refer back to [14].

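Lemma 7 (Bounding the One-Step Blocking Approximation Error [14]). Suppose there exists $\varepsilon > 0$ such that
$\varepsilon \le p^v(x, z^v) \le \varepsilon^{-1}$ for all $v \in V$ and $x, z \in \mathbb{X}$. Let $\nu$ be a probability measure on $\mathbb{X}$, and suppose that

$$\mathrm{Corr}(\nu, \beta) \le \frac{1}{2} - (1 - \varepsilon^2)\, e^{\beta(r+1)} \Delta$$

for a sufficiently small constant $\beta > 0$. Then

$$\sup_{x, z \in \mathbb{X}} \big\| (F_s \nu)^v_{x,z} - (\tilde F_s \nu)^v_{x,z} \big\| \le 4e^{-\beta}(1 - \varepsilon^{2\Delta})\, e^{-\beta d(v, \partial K)}$$

for every $s \in \mathbb{N}$ and every $K \in \mathcal{K}_{\sigma(s)}$ with $v \in K$. Furthermore,

$$|||F_s \nu - \tilde F_s \nu|||_J \le 4|J|\, e^{-\beta}(1 - \varepsilon^{2\Delta})\, e^{-\beta d(J, \partial K)}$$

for every $K \in \mathcal{K}_{\sigma(s)}$ and $J \subseteq K$.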

Now  we  are  ready  to  put  all  the  results  together  and  finalise  the  bound  on  the  bias. 

6.1  Proof  of  Theorem  2

Note $\pi_0 = \mu$ and here we consider the case $\mu = \delta_x$ for some $x \in \mathbb{X}$. Firstly, note

$$\varepsilon > \varepsilon_0 = \big(1 - 1/(18\Delta^2)\big)^{1/2\Delta} > \big(1 - 1/(16\Delta^2)\big)^{1/2\Delta}$$

and $\beta = -(2r)^{-1} \log 16\Delta^2(1 - \varepsilon^{2\Delta}) > 0$ such that Lemma 6 holds. Moreover,

$$\mathrm{Corr}(\tilde\pi^x_n, \beta) \le 1/3 < 4/9 \le \frac{1}{2} - (1 - \varepsilon^2)\, e^{\beta(r+1)} \Delta$$

for every $n \ge 0$ and Lemma 7 holds. Finally,

$$\mathrm{Corr}(\tilde\pi^x_n, \beta) \le 1/3 \le \frac{1}{2} - 3(1 - \varepsilon^{2\Delta})\, e^{2\beta r} \Delta^2$$

for every $n \ge 0$ and Lemma 1 holds. Thus, the bound on the decay of correlations holds, and both local filter
stability and the one-step blocking approximation error bound hold under the given parameter hypotheses.


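Fix $K_{\sigma(s)} \in \mathcal{K}_{\sigma(s)}$ so $J \subseteq K_{\sigma(s)}$ for all $s \in \mathbb{N}$. Then

$$|||\pi^x_n - \tilde\pi^x_n|||_J \le \sum_{s=1}^{n} |||F_n \cdots F_{s+1} F_s \tilde\pi^x_{s-1} - F_n \cdots F_{s+1} \tilde F_s \tilde\pi^x_{s-1}|||_J$$

$$= |||F_n \tilde\pi^x_{n-1} - \tilde F_n \tilde\pi^x_{n-1}|||_J + \sum_{s=1}^{n-1} |||F_n \cdots F_{s+1} F_s \tilde\pi^x_{s-1} - F_n \cdots F_{s+1} \tilde F_s \tilde\pi^x_{s-1}|||_J$$
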

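Following [14] by application of Lemma 1 and Lemma 6 we easily find

$$|||\pi^x_n - \tilde\pi^x_n|||_J \le 8e^{-\beta}(1 - \varepsilon^{2\Delta}) \left[ |J|\, e^{-\beta d(J, \partial K_{\sigma(n)})} + \sum_{s=1}^{n-1} e^{-\beta(n-s)} \sum_{v \in J} e^{-\beta d(v, \partial K_{\sigma(s)})} \right]$$

for every $n \ge 0$, $x \in \mathbb{X}$ and every $K_{\sigma(s)} \in \mathcal{K}_{\sigma(s)}$ such that $J \subseteq K_{\sigma(s)}$ for all $s \in \mathbb{N}$.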

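For any site $v$ consider the sequence $K_{\sigma(s)}(v) \in \mathcal{K}_{\sigma(s)}$ for all $s \in \mathbb{N}$. The bound just given becomes

$$|||\pi^x_n - \tilde\pi^x_n|||_v \le 8e^{-\beta}(1 - \varepsilon^{2\Delta}) \sum_{s=1}^{n} e^{-\beta(n-s)}\, e^{-\beta d(v, \partial K_{\sigma(s)}(v))}$$
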

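Recall the hypothesis $\sigma(s) = s \pmod m$, $s \in \mathbb{N}$. Now averaging the bound gives

$$\frac{1}{m} \sum_{k=0}^{m-1} |||\pi^x_{n-k} - \tilde\pi^x_{n-k}|||_v \le 8e^{-\beta}(1 - \varepsilon^{2\Delta})\, \frac{1}{m} \sum_{k=0}^{m-1} \sum_{s=1}^{n-k} e^{-\beta(n-k-s)}\, e^{-\beta d(v, \partial K_{\sigma(s)}(v))}$$
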

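and we define $|||\pi^x_{-s} - \tilde\pi^x_{-s}|||_v$ for $s \in \mathbb{N}$ to be zero. Now

$$\frac{1}{m} \sum_{k=0}^{m-1} \sum_{s=1}^{n-k} e^{-\beta(n-k-s)}\, e^{-\beta d(v, \partial K_{\sigma(s)}(v))} = \frac{1}{m} \sum_{k=0}^{m-1} \sum_{s=0}^{n-k-1} e^{-\beta s}\, e^{-\beta d(v, \partial K_{\sigma(n-k-s)}(v))} \le \frac{1}{m} \sum_{k=0}^{m-1} \sum_{s=0}^{n-1} e^{-\beta s}\, e^{-\beta d(v, \partial K_{\sigma(n-k-s)}(v))}$$
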


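where now we extend $\sigma(s)$ to all $s \in \mathbb{Z}$ so $\sigma(s) = s \pmod m$. Then

$$\frac{1}{m} \sum_{k=0}^{m-1} \sum_{s=0}^{n-1} e^{-\beta s}\, e^{-\beta d(v, \partial K_{\sigma(n-k-s)}(v))} = \frac{1}{m} \sum_{s=0}^{n-1} e^{-\beta s} \sum_{k=0}^{m-1} e^{-\beta d(v, \partial K_{\sigma(n-k-s)}(v))} = \frac{1}{m} \sum_{s=0}^{n-1} e^{-\beta s} \sum_{k=0}^{m-1} e^{-\beta d(v, \partial K_{\sigma(k)}(v))}$$
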


where we swapped the summation order and used the fact that

$$\sum_{k=0}^{m-1} e^{-\beta d(v, \partial K_{\sigma(n-k-s)}(v))} = \sum_{k=0}^{m-1} e^{-\beta d(v, \partial K_{\sigma(k)}(v))}$$

for $s = 0$ and all $s \in \mathbb{N}$ because of the cyclical partitioning sequence. Now it follows that


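$$\frac{1}{m} \sum_{k=0}^{m-1} |||\pi^x_{n-k} - \tilde\pi^x_{n-k}|||_v \le \frac{8e^{-\beta}}{1 - e^{-\beta}}(1 - \varepsilon^{2\Delta})\, \frac{1}{m} \sum_{j=0}^{m-1} e^{-\beta d(v, \partial K_j(v))} = \frac{8e^{-\beta}}{1 - e^{-\beta}}(1 - \varepsilon^{2\Delta})\, \vartheta_m(v)$$

noting $\sum_s e^{-\beta s} \le 1/(1 - e^{-\beta})$. Let $\nabla_d(v) \triangleq \min_s d(v, \partial K_s)$ then the over bounding

$$\frac{8e^{-\beta}}{1 - e^{-\beta}}(1 - \varepsilon^{2\Delta})\, \vartheta_m(v) \le \frac{8e^{-\beta}}{1 - e^{-\beta}}(1 - \varepsilon^{2\Delta}) \exp\left(-\beta e^{-\beta(\Delta_d(v) - \nabla_d(v))}\, \theta_m(v)\right)$$

follows from Slater's inequality and the proof is complete. □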
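The averaging step can be sanity-checked numerically. The sketch below evaluates the single-site sum $\sum_{s=1}^{n} e^{-\beta(n-s)} e^{-\beta d(v, \partial K_{\sigma(s)}(v))}$ with $\sigma(s) = s \pmod m$, averages it over $m$ consecutive times, and compares the result with $\vartheta_m(v)/(1 - e^{-\beta})$. The common prefactor $8e^{-\beta}(1 - \varepsilon^{2\Delta})$ is omitted, and the rate `beta` and the distances in `dists` are hypothetical values.

```python
import math

def site_bound(n, dists, beta):
    """sum_{s=1}^{n} exp(-beta (n - s)) * exp(-beta d_{sigma(s)}) with sigma(s) = s mod m."""
    m = len(dists)
    return sum(math.exp(-beta * (n - s)) * math.exp(-beta * dists[s % m])
               for s in range(1, n + 1))

def averaged_bound(n, dists, beta):
    """Average of site_bound over times n, n-1, ..., n-m+1 (negative times contribute zero)."""
    m = len(dists)
    return sum(site_bound(n - k, dists, beta) for k in range(m) if n - k >= 0) / m

beta = 0.7            # hypothetical rate
dists = [0, 2, 1, 3]  # hypothetical d(v, dK_j(v)) for j = 0..m-1

vartheta = sum(math.exp(-beta * d) for d in dists) / len(dists)
limit = vartheta / (1.0 - math.exp(-beta))

per_time = [site_bound(n, dists, beta) for n in range(40, 44)]
averaged = [averaged_bound(n, dists, beta) for n in range(40, 44)]
print(per_time, averaged, limit)
```

In this illustration the per-time values oscillate with the phase of the partition cycle, whereas the averages over one full cycle stay below the closed-form limit and are essentially constant once $n$ is large.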

Again, both inequalities in Theorem 2 imply that the bias introduced due to blocking can be spatially averaged
(or smoothed) across a cyclical application of a sequence of partitions and both inequalities collapse to
the result of [14] in the case $m = 1$.